1 Objectives

  • Learn more about R and how to inspect objects and data types
  • Use the soilDB package to load NASIS pedon data into R
  • Learn about the checks run on data loaded by the fetchNASIS() function
  • Understand the structure of data stored in a Soil Profile Collection (SPC)
  • Learn ways to filter and subset SPC data in R
  • Learn how functions can be used to bundle operations
  • Review additional data that is accessible via extended data functions

2 The soilDB Package

What if you could extract, organize, and visualize data from NASIS and many other commonly used soil database sources with a couple of lines of code?

2.1 soilDB Functions

2.2 Importance of Pedon Data

The importance of pedon data for present and future work cannot be overstated. These data represent decades of on-the-ground observations of the soil resource for a given area. As difficult as it may be to take the time to enter legacy pedon data, it is vitally important that we capture this resource and get these data into NASIS as an archive of point observations.

2.3 Some Issues With Pedon Data

  • Making and documenting observations of soil requires hard work. Digging is difficult, and writing soil descriptions is time consuming!
  • Our confidence in observations commonly weakens with the depth of the material described.

    • If we acknowledge this, which we must, then how do we deal with it in pedon data?
      • Use a cutoff depth, for example 100 cm, can be used to truncate observations to a zone of greater confidence.
      • Show the relative confidence of the data with depth.

2.4 Common Challenges in Working with Pedon Data

  • Consistency
    • Missing data
  • Confidence in the observations
    • Uncertainty with depth
  • Description style differences
    • Depth described, horizonation usage styles
  • Legacy data vintage
    • Decadal span of data
    • Taxonomy updates, horizon nomenclature
  • Location confidence
    • Origin of the location information
    • Datum used for data collection
    • Accuracy for GPS values at the time of data collection

2.4.1 Meeting the Challenges

For more information regarding difficult pedon data, see the following tutorial in the “aqp” package:
Dealing with Troublesome data.

3 Using the soilDB Package

3.1 How does it work and what does it do?

The soilDB package for R works with soil-resource-related data sources. It has a series of convenience functions for accessing data in NASIS, KSSL, SDA, and other sources. The fetchNASIS convenience function extracts data from a NASIS selected set via Structured Query Language (SQL). Basic data checks are run within the fetch functions, then the data are assembled into a combined site-level and horizon-level data structure within a custom R object called a Soil Profile Collection (SPC). The SoilProfileCollection class simplifies the process of working with collections of data associated with soil profiles, e.g., site-level data, horizon-level data, spatial data, diagnostic horizon data, metadata, etc.. The SPC object inerits many of the familiar methods that operate on vectors and data.frame objects, such as nrow() (“how many horizons”) or length() (“how many pedons”).

Note that the import process in fetchNASIS() is not comprehensive. It does not pull all of the data for every table related to pedon data out of NASIS. Instead, it pulls much of the most commonly used pedon and horizon data. In addition, much of the nested complexity of the NASIS data structure is simplified in the resulting SPC object. Higher level functions like fetchNASIS() bundle a series of lower level functions that get specific parts of the data structure.

One-to-many relationships are flattened where possible by fetchNASIS(). This flattening aggregates the data into one site record with related horizon records. Selected additional data elements that may have a one-to-many relationship to a site or pedon can be gathered from a NASIS selected set via the get_extended_data_from_NASIS_db() function and then joined to the site-level data. More on this process in later…..

In short, the soilDB package greatly simplifies the process of getting pedon data from NASIS into R for further analysis.

3.2 Set Up an Open Database Connectivity (ODBC) Connection to NASIS

After setting up an ODBC connection, you can use R to access data from a selected set defined in your local NASIS database. See this job aid: How to Create an ODBC Connection and Setup SoilDB for Use with R.

Query and load some pedon data into your NASIS selected set.

Does NASIS need to be open and running to query data using soilDB?

No, fetchNASIS() works whether the NASIS application is running or not. You just need to make sure that the data you want has been loaded into your selected set.

3.3 Prepare Example Data

Take a moment to open the NASIS client and create a selected set with some site/pedon objects that will be used in the following sections. Using a query that includes both site and pedon objects, download to your local database and selected set sites with MT647 in the user site ID. Depending on the query, you will need to include wildcard characters like this: %MT647%.

3.3.1 Data Checks Run by the fetchNASIS() Function

When you load pedons using the fetchNASIS() function, the following data checks are performed:

  • Presence of multiple map datums. Results reported to the user and the data are not modified.

  • Inconsistent horizon boundaries. Pedons with inconsistent horizon boundaries are not loaded. In most cases, this occurs when the bottom depth of a horizon is not the same as the upper depth of the next lower horizon.

##   hzname top bot
## 1      A   0  30
## 2    Bt1  38  56
## 3    Bt2  56 121
## 4     Bk 121 135
## 5      R 135

Note the issue above. The bottom depth of the A horizon and the upper depth of the Bt1 horizon should be the same: either 30 or 38 cm. The correct depth needs to be determined.

  • Missing lower horizon depths. Offending horizons are fixed by replacing the missing bottom depth with the top depth plus 2 cm. In the case of the profile shown above, a bottom depth of 137 cm would be inserted where the depth is missing.

  • Sites missing pedon records. Data without corresponding horizons are not loaded.

3.3.1.1 How can you find the site ID’s where these errors occur and fix them in NASIS?

If errors in the pedon data are detected when loading data using fetchNASIS(), the following “get” functions can trace them back to the corresponding records in NASIS:

  • get(‘sites.missing.pedons’, envir=soilDB.env)
    • Returns user site ID’s for sites missing pedons.
  • get(‘dup.pedon.ids’, envir=soilDB.env)
    • Returns pedon ID’s for sites with duplicate pedon ID’s.
  • get(‘bad.pedon.ids’, envir=soilDB.env)
    • Returns user pedon ID’s for pedons with inconsistent horizon depths.
  • get(‘bad.horizons’, envir=soilDB.env)
    • Returns a dataframe of horizon-level information for pedons with inconsistent horizon depths.

For more information on the design of soilDB functions, see the following documentation: Introduction to soilDB.

3.3.2 fetchNASIS() arguments

fetchNASIS() is a versatile function have has many arguments (i.e. options) that can be chosen.

  • from = ‘pedons’ or ‘components’ or ‘pedon_report’
    • This option allows you to select which data you want to load from NASIS. Choosing either ‘pedons’ or ‘components’ will load data from your local database. If ‘pedon_report’ is specified then it will load data from the text file generated by the NASIS report ‘fetchNASIS’, which is handy for loading more than 5000 pedons at one time, such as for an entire MLRA Soil Survey Office.
  • url =https://nasis.sc.egov.usda.gov/OfflineReports/fetchNASIS_04e6ec7d-fab5-4a90-bb88-9b9dc56dfdd8.txt
    • If from = ‘pedon_report’ this option will load data from the URL that is generated when the NASIS report ‘fetchNASIS’ is run offline against the national database. This is particularly useful for loading more than 20,000 pedons at one time, such for an entire Soil Survey Region.
  • SS = TRUE/FALSE
    • The Selected Set (SS) option allows you to choose whether you want the data to load from your current selected set in NASIS or from the local database tables. The default is set to TRUE so if unspecified fetchNASIS() will always load from the data in the selected set.
  • stringAsFactors = TRUE/FALSE
    • This option allows you to select whether to convert strings into factors or not. The default is set to FALSE, which will handle strings as character formats. Manually set this option to TRUE if you wish to handle character strings as factors.
  • rmHzErrors = TRUE/FALSE
    • Setting this value to TRUE (the default) enables checks for horizon depth consistency. Consider setting this argument to FALSE if you aren’t concerned about horizon-depth errors or if you know that your selected set contains many combination horizons (e.g., consisting of E/Bt horizons or similar two-part horizons described individually for the same depth range). Note that any pedons flagged as having horizon-depth errors (rmHzErrors = TRUE) are omitted from the data returned by fetchNASIS().
  • nullFragsAreZero = TRUE/FALSE
    • Setting this value to TRUE (the default) converts null entries for rock fragment volumes to 0. This is typically the right assumption because rock fragment data are typically populated only when observed. If you know that your data contain a combination of omitted information (e.g. no rock fragment volumes are populated) then consider setting this argument to FALSE.
  • soilColorState = ‘moist’ or ‘dry’
    • This option allows you to select whether to treat strings as factors or not. The default is set to FALSE, which will not treat strings as factors.
  • lab = TRUE/FALSE
    • This option allows for loading the data associated with horizons that may be in the phlabresults table. The default is set to FALSE, which will not load records from the phlabresults table.

For more information on the data checks and adjusting the default options to fetchNASIS() function, see the following resource: Tips on Getting Data from NASIS into R.

3.4 Internal Structure of an SoilProfileCollection (SPC) Object

3.4.1 The Gopheridge Sample Dataset

The gopheridge sample dataset is very similar to the type of data returned from fetchNASIS(). The following demonstration shows the structure of the Soil Profile Collection (SPC) object that is returned by fetchNASIS().

Before proceeding, you may find it helpful to review the following: SoilProfileCollection Object Introduction. This tutorial provides an excellent overview of how the SPC object is constructed. Also, the manual pages for soilDB and aqp are accessible (click index at the bottom of the Help tab in RStudio) by entering the following into the R console:

# not run
library(soilDB)
help(soilDB)

# for links to lots of great examples look here!
library(aqp)
help(aqp)

Open RStudio, and set up the environment by loading packages and the Gopheridge sample dataset.

options(width=95, stringsAsFactors=FALSE)
library(soilDB)
library(aqp)

# load example dataset
data(gopheridge)

# what kind of object is this?
class(gopheridge)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
# what does the internal structure look like?
str(gopheridge, 2)
## Formal class 'SoilProfileCollection' [package "aqp"] with 11 slots
##   ..@ idcol       : chr "peiid"
##   ..@ hzidcol     : chr "phiid"
##   ..@ hzdesgncol  : chr "hzname"
##   ..@ hztexclcol  : chr "texcl"
##   ..@ depthcols   : chr [1:2] "hzdept" "hzdepb"
##   ..@ metadata    :'data.frame': 1 obs. of  2 variables:
##   ..@ horizons    :'data.frame': 317 obs. of  69 variables:
##   ..@ site        :'data.frame': 52 obs. of  88 variables:
##   ..@ sp          :Formal class 'SpatialPoints' [package "sp"] with 3 slots
##   ..@ diagnostic  :'data.frame': 162 obs. of  4 variables:
##   ..@ restrictions:'data.frame': 56 obs. of  8 variables:
# let's take a look at the fields at the site and horizon levels within the SPC
siteNames(gopheridge)
##  [1] "peiid"                "pedon_id"             "siteiid"             
##  [4] "site_id"              "obs_date"             "utmzone"             
##  [7] "utmeasting"           "utmnorthing"          "x"                   
## [10] "y"                    "horizdatnm"           "x_std"               
## [13] "y_std"                "gpspositionalerror"   "describer"           
## [16] "pedonpurpose"         "pedontype"            "pedlabsampnum"       
## [19] "labdatadescflag"      "tsectstopnum"         "tsectinterval"       
## [22] "utransectid"          "tsectkind"            "tsectselmeth"        
## [25] "elev_field"           "slope_field"          "aspect_field"        
## [28] "plantassocnm"         "earthcovkind1"        "earthcovkind2"       
## [31] "erocl"                "bedrckdepth"          "bedrckkind"          
## [34] "bedrckhardness"       "hillslopeprof"        "geomslopeseg"        
## [37] "shapeacross"          "shapedown"            "slopecomplex"        
## [40] "drainagecl"           "geomposhill"          "geomposmntn"         
## [43] "geomposflats"         "slope_shape"          "classdate"           
## [46] "classifier"           "classtype"            "taxonname"           
## [49] "localphase"           "taxonkind"            "seriesstatus"        
## [52] "taxpartsize"          "taxorder"             "taxsuborder"         
## [55] "taxgrtgroup"          "taxsubgrp"            "soiltaxedition"      
## [58] "osdtypelocflag"       "taxmoistcl"           "taxtempregime"       
## [61] "taxfamother"          "psctopdepth"          "pscbotdepth"         
## [64] "selection_method"     "ecositeid"            "ecositenm"           
## [67] "ecositecorrdate"      "es_classifier"        "es_selection_method" 
## [70] "ochric.epipedon"      "argillic.horizon"     "lithic.contact"      
## [73] "cambic.horizon"       "paralithic.contact"   "mollic.epipedon"     
## [76] "paralithic.materials" "surface_fgravel"      "surface_gravel"      
## [79] "surface_cobbles"      "surface_stones"       "surface_boulders"    
## [82] "surface_channers"     "surface_flagstones"   "surface_paragravel"  
## [85] "surface_paracobbles"  "landform_string"      "pmkind"              
## [88] "pmorigin"
horizonNames(gopheridge)
##  [1] "peiid"                "phiid"                "hzname"              
##  [4] "genhz"                "hzdept"               "hzdepb"              
##  [7] "clay"                 "silt"                 "sand"                
## [10] "fragvoltot"           "texture"              "texcl"               
## [13] "lieutex"              "phfield"              "effclass"            
## [16] "labsampnum"           "rupresblkdry"         "stickiness"          
## [19] "plasticity"           "ksatpedon"            "texture_class"       
## [22] "d_r"                  "d_g"                  "d_b"                 
## [25] "d_hue"                "d_value"              "d_chroma"            
## [28] "d_sigma"              "m_r"                  "m_g"                 
## [31] "m_b"                  "m_hue"                "m_value"             
## [34] "m_chroma"             "m_sigma"              "moist_soil_color"    
## [37] "dry_soil_color"       "fine_gravel"          "gravel"              
## [40] "cobbles"              "stones"               "boulders"            
## [43] "channers"             "flagstones"           "parafine_gravel"     
## [46] "paragravel"           "paracobbles"          "parastones"          
## [49] "paraboulders"         "parachanners"         "paraflagstones"      
## [52] "unspecified"          "total_frags_pct_nopf" "total_frags_pct"     
## [55] "art_fgr"              "art_gr"               "art_cb"              
## [58] "art_st"               "art_by"               "art_ch"              
## [61] "art_fl"               "art_unspecified"      "total_art_pct"       
## [64] "huartvol_cohesive"    "huartvol_penetrable"  "huartvol_innocuous"  
## [67] "huartvol_persistent"  "soil_color"           "hzID"

3.4.1.1 Quickly generate sketches from a SoilProfileCollection object

The plot() function applied to a SoilProfileCollection object generates sketches based on horizon depths, designations, and colors. The fetchNASIS() function automatically converts moist Munsell colors into R-style colors. Multiple colors per horizon are mixed. See ?plotSPC for a detailed list of arguments and examples.

par(mar=c(1,1,1,1))
# ommiting pedon IDs and horizon designations
plot(gopheridge, print.id=FALSE, name='', width=0.3)
title('Pedons from the `gopheridge` sample dataset', line=-0.5)

Additional documentation and examples can be found in:

3.4.1.2 Follow along with your own data

Explore the site- and horizon-level data in your own SPC using the following code. Note: You must have pedons in your local NASIS selected set.

# load required libraries
library(soilDB)
library(aqp)

# load data from a NASIS selected set
pedons <- fetchNASIS(from = 'pedons')

# what kind of object is this?
class(pedons)

# how many pedons
length(pedons)

# let's take a look at the fields at the site and horizon levels within the SPC
siteNames(pedons)
horizonNames(pedons)

# look at the first 2 rows of site and horizon data
head(site(pedons), 2)
head(horizons(pedons), 2)

How can you find out how many site and horizon records are in the data you just loaded?

3.5 Viewing Pedon Locations

3.5.1 Plotting Geographic Data Directly in R

Quick check: Does the data plot roughly where you expect it?

Plotting the data directly as an R graphic can give you some idea of how the data look spatially and whether their distribution approximates what you expect. Typos are relatively common when coordinates are manually entered. Viewing the data spatially is a quick way to see if any points plot far outside of the geographic area of interest and therefore clearly have an error.

# plot the locations of the gopheridge pedons within R
# Steps:
# 1) subset to a new data frame
# 2) create a spatial points data frame (SPDF)
# 3) plot the data

# load libraries
library(sp)
library(mapview)

data("gopheridge")

# subset standard WGS84 decimal degree coordinates from the gopheridge SPC by specifying column names
gopher.locations <- site(gopheridge)

# initialize coordinates in an SPDF
coordinates(gopher.locations) <- ~ x_std + y_std
# define coordinate system
proj4string(gopher.locations) <- '+proj=longlat +datum=WGS84'

# creat interactive map
mapview(gopher.locations, legend=FALSE, map.types='OpenStreetMap', label=gopher.locations$site_id)

3.5.2 Displaying Pedon Data in Google Earth

Google Earth is a powerful viewer for point data. Geographic data is displayed in Google Earth using the Keyhole Markup Language (KML) format. Using the plotKML package, you can easily create a KML file to inspect and view in Google Earth. See the related material in this tutorial: Export Pedons to Google Earth.

3.5.3 Exporting Pedon Data to an ESRI Shapefile

Another way you can view the data is to export a shapefile from R. For further information, see this tutorial: Export Pedons to Shapefile.

3.5.3.1 Follow along with your own data

Use the script below to make an R plot of pedon data loaded from your NASIS selected set.

The following script plots the standard lat/long fields from NASIS. In some cases, these fields might be incomplete due to insufficient data or to not having been calculated from UTM coordinates in NASIS. In these cases, you can omit sites with “NA” values in the coordinates a couple of ways. The na.omit() or complete.cases() functions remove any rows in a dataframe that have “NA” values.

Run the following script on the data loaded from your local NASIS selected set:

# load libraries
library(aqp)
library(soilDB)
library(sp)
library(mapview)

# get pedons from the selected set
pedons <- fetchNASIS(from = 'pedons')

# subset standard WGS84 decimal degree coordinates from the 
# gopheridge SPC by specifying column names
pedons.sp <- site(pedons)[, c('site_id', 'x_std', 'y_std')]
nrow(pedons.sp)

# remove any sites lacking standard lat/long coordinates
# notice that there may now be fewer rows of data
pedons.sp <- na.omit(pedons.sp)
nrow(pedons.sp)

# initialize coordinates in an SPDF
coordinates(pedons.sp) <- ~ x_std + y_std
# define coordinate system
proj4string(pedons.sp) <- '+proj=longlat +datum=WGS84'

# plot
mapview(pedons.sp, legend=FALSE, map.types='OpenStreetMap', label=pedons.sp$site_id)

4 Working with SPC Data in R

4.1 Summarizing Data

Now that you’ve loaded some data, you can look at additional ways to summarize data elements and filter the SPC to specific sites of interest. The table() function is very useful for quick summary operations. This function can be combined with other functions, such as sort() and is.na() or !is.na() (is not NA). Follow along with your own data.

# summarize which soil taxa we have loaded
table(pedons$taxonname)
# sort results in descending order
sort(table(pedons$taxonname), decreasing=TRUE)

# could do the same thing for taxonomic subgroups or any column of the SPC at the site or horizon levels
table(pedons$taxsubgrp)
sort(table(pedons$taxsubgrp), decreasing=TRUE)

# table() is also useful when testing for null data using IS NA, is.na() or IS NOT NA, !is.na()
table(is.na(pedons$taxsubgrp))
table(!is.na(pedons$taxsubgrp))

# it can also be applied to horizon level columns in the SPC
sort(table(pedons$texture), decreasing=TRUE)

4.2 Filtering

A variety of methods are available to subset a collection of soil profiles or an SPC. The results can then be placed into another SPC. This capacity can be useful for generating subset SPC objects from the original dataset.

4.2.1 Logical Operators

  • %in% Equivalent to IN () in SQL. Can use c() to concatenate lists of vectors.
    • Example: pedons$taxpartsize %in% c('loamy-skeletal', 'sandy-skeletal')
  • != Not-equal-to character “string.”
  • == Note in the example above that R uses a double equal sign as “equal to.”
  • <, >, <=, >= Less than, greater than, less than or equal to, and greater than or equal to.

4.2.2 Pattern Matching

The following examples use the grep() function to pattern match within the data, create an index of the SPC for records that match the specified pattern within that column, and then use that index to filter to specific sites and their corresponding profiles. Patterns are specified in regular expression (REGEX) syntax.

This process can be applied to many different columns in the SPC based on how you need to filter the data. This example pattern matches on the tax_subgroup column, but another useful application might be to pattern match on geomorphology or parent material.

Note that the grep() below also has an invert option, which is specified as either true or false (the default is false). When set to true, this option is very useful for excluding the results of the pattern matching process by inverting the selection.

# say we wanted to look at what the variation of particle size classes are within a specific subgroup?
# use of grep() to filter and create an index, then apply that index to the SPC 
# and create a new SPC called 'f1' using the square bracket notation
idx <- grep('lithic', pedons$taxsubgrp, invert=FALSE)
# save this subset of 'lithic' soils for later use  
subset1 <- pedons[idx, ]
# or use the index directly to summarize a field
sort(table(pedons$taxpartsize[idx]), decreasing=TRUE)

Do a quick graphical check to ensure that the “lithic” profiles are selected. Plot them in R using the SoilProfileCollection “plot” method (e.g., specialized version of the generic plot() function).

# adjust margins
par(mar=c(1,0,0,1))
# plot the first 10 profiles of subset1
# limit plotting to a depth of about 60cm
plot(subset1[1:10, ], label='site_id', max.depth=60)
title('Pedons with the word "lithic" at subgroup-level of Soil Taxonomy', line=-2)

For more information on using regular expressions in grep() for pattern matching operations, see: Regular-expression-syntax.

4.2.2.1 Additional syntax options for REGEX pattern matching

  • | Equivalent to “or” in SQL.
    • Example: grep('loamy | sandy', pedons$taxpartsize)
  • ^ Anchors to the left side of the string.
    • Example: grep('^sandy', pedons$taxpartsize).
  • $ Anchors to the right side of the string.
    • Example: grep('skeletal$', pedons$taxpartsize).

4.2.3 Filtering Data by Specifying a Criteria Using the which() Function

Another method of subsetting a collection of soil profiles is to specify a criteria using the which() function. The following examples use the which() and grep() functions to reference the indexing of the SPC to create subsets and to filter for specific sites or their corresponding profiles.

# say we wanted to look at what the variation of particle size classes are within a specific subgroup?
# first: use grep to pattern match the tax_subgroup field for the string 'aqu'
idx <- grep('lithic', pedons$taxsubgrp)
# save this subset
subset1 <- pedons[idx, ]
# check taxonomic range of particle size classes in the data
sort(table(subset1$taxsubgrp), decreasing=TRUE)
sort(table(subset1$taxpartsize), decreasing=TRUE)

# then further query the subset for only those profiles with particle size class of 'sandy-skeletal'
# notice: a double equal sign '==' is used for exact character or numeric criteria
idx <- which(subset1$taxpartsize == 'loamy')
# save this subset
subset2 <- subset1[idx, ]
table(subset2$taxpartsize)
# plot  profiles 1 thru 10
par(mar=c(0,0,2,1))
plot(subset2, label='site_id')
title('Loamy particle size control section class')

4.2.4 Extracting Site and Horizon Data

Soil Profile Collections are designed to be dismantled so they can work more easily with either site or horizon data. The SPC has a slot for site-level data and a slot for horizon-level data. You can reference these slots using the site() and horizons() functions within the AQP package. These “get” functions extract all the site or horizon variables as a dataframe for further use.

# extract site data from SPC into dataframe 's'
s <- site(pedons)
names(s)
# extract horizon data from SPC into dataframe 'h'
h <- horizons(pedons)
names(h)

You can also use these functions when referencing the data within an SPC to specify that you want to look specifically in the site or horizon data.

4.2.5 Review of Data Checks Run by fetchNASIS()

Now that you’ve loaded some data and learned a little about how to filter data in the SPC, you can quickly review some of the get() functions used to track data issues detected in the process of loading data back to the NASIS records in your selected set.

# use each one of these to return a vector of the pedons where errors were detected
#get('sites.missing.pedons', envir=soilDB.env)
#get('dup.pedon.ids', envir=soilDB.env)
#get('bad.pedon.ids', envir=soilDB.env)
# example of pedon_id's returned
#[1] "2011MT0810001" "2011MT0810009" "2011MT0810015" "2011MT0810027" "2011MT0810034"

#get('bad.horizons', envir=soilDB.env)

# How could you then remove these from your SPC?
# since the get() returns the string of bad pedon id's we can use a which() to query any pedon id's that don't match the bad id's
idx <- which(! pedons$pedon_id %in% get('bad.pedon.ids', envir=soilDB.env))
pedons <- pedons[idx, ]

Another useful function is dput(), which concatenates a variable. It converts something like this:

“2011MT0810001” “2011MT0810009” “2011MT0810015” “2011MT0810027” “2011MT0810034”

into a comma delimited string like this:

c(“2011MT0810001”, “2011MT0810009”, “2011MT0810015”, “2011MT0810027”, “2011MT0810034”).

Such a string can be copied and then pasted back as a concatenated string or could even be used as string for NASIS list queries. The dput() function is also helpful when sending questions or examples to colleagues via email.

4.3 Getting Additional Data: Extended Data Functions

Additional data related to both site and horizon information can be fetched using the get_extended_data_from_NASIS() function. This data is not automatically brought into R because these data elements are typically related to the site or horizon data in one-to-many relationships. For example, multiple diagnostic features could exist within one pedon. Below is a summary of additional information that can be readily brought into R from your NASIS selected set via the get_extended_data_from_NASIS() function.

# fetch extended site and horizon data
e <- get_extended_data_from_NASIS_db()

### site and pedon related extended data
# vegetation data summary
colnames(e$ecositehistory) 

# diagnostic features
colnames(e$diagnostic) 

# surface rock fragments
colnames(e$surf_frag_summary)

# geomorphic description
colnames(e$geomorph)

# taxonomic history data
colnames(e$taxhistory)

# linked photo stored in site textnotes
colnames(e$photo) 

# site parent materials
colnames(e$pm)

### horizon related extended data
# rock fragments 
colnames(e$frag_summary) 

# soil texture modifers
colnames(e$texmodifier)

# soil structure data
colnames(e$struct) 

The “Geomorphic description” and “parent materials” attributes are important soil data. They can be useful handles for exploring other data. The soilDB package flattens the nested table structure of parent material and geomorphic description within NASIS into single strings for each site-level record. The pattern matching concepts demonstrated above select profiles based on parts of these strings. The following code generates a simple graphical summary of the 10 most commonly occurring landforms in fetchNASIS() data so you can see their frequency of occurrence.

# graphically tabulate the occurrence of landforms
# load required libraries
library(soilDB)
# required for dotchart2()
library(Hmisc)

# load data from a NASIS selected set
pedons <- fetchNASIS(from = 'pedons')

# create 'lf' object of landform factors sorted in descending order
lf <- sort(table(pedons$landform_string), decreasing = TRUE)

# plot top 10 or length, whichever is shorter
dotchart2(lf[1:pmin(10, length(lf))], col='black', xlim = c(0, max(lf)), cex.labels = 0.75)

4.3.1 Deriving Thicknesses of Diagnostic Features

4.3.1.1 Boolean diagnostic feature columns in the site data

If diagnostic features are populated in the pedon diagnostic features table in NASIS, then Boolean (TRUE or FALSE) fields are created for each diagnostic feature in the data brought into R by soilDB. These fields can be readily used to model the presence or absence of a diagnostic soil feature by extracting the site data.

You could use the following code to pull the upper depth to calcium carbonates using the “calcic horizon” and/or the “secondary carbonates” diagnostic features. However, data consistency is critical. You can only use those fields if they are consistently populated for all pedons that you are working with in your selected set. As you start working with larger pedon data sets, you will quickly find that there can be great inconsistencies in the way the data were populated by different people in different offices on different surveys over different time frames.

The following is an example of how you could use the diagnostic features (if populated!) from the extended data to determine the thickness of a diagnostic feature of interest.

# rename gopheridge data
data("gopheridge")
f <- gopheridge

# get diagnostic features associated with pedons loaded from selected set
d <- diagnostic_hz(f)

# summary of the diagnostic features in your data!
unique(d$featkind)
## [1] ochric epipedon      argillic horizon     lithic contact       cambic horizon      
## [5] paralithic contact   mollic epipedon      paralithic materials <NA>                
## 84 Levels: anthropic epipedon abrupt textural change andic soil properties ... manufactured layer contact
sort(table(droplevels(factor(d$featkind))), decreasing = TRUE)
## 
##      ochric epipedon     argillic horizon       lithic contact   paralithic contact 
##                   51                   44                   33                   19 
##       cambic horizon      mollic epipedon paralithic materials 
##                   12                    1                    1
# subset argillic horizons
d <- d[d$featkind == 'argillic horizon', ]

# create a new column and subtract the upper from the lower depth
d$argillic_thickness_cm <- d$featdepb - d$featdept

# create another new column with the upper depth to the diagnostic feature
d$depth_to_argillic_cm <- d$featdept

# omit NA values
d <- na.omit(d)

# subset to pedon records IDs and calculated thickness
d <- d[, c('peiid', 'argillic_thickness_cm', 'depth_to_argillic_cm')]
head(d)
##     peiid argillic_thickness_cm depth_to_argillic_cm
## 2  242808                    63                   18
## 5  252851                    50                   18
## 7  268791                    43                   15
## 10 268793                    71                   10
## 13 268794                    50                    5
## 16 268795                    46                   15
# join these data with existing site data
site(f) <- d

# plot as histogram
# reset figure margins
par(mar = c(4.5,4.5,1,1))
hist(f$argillic_thickness_cm, xlab = 'Thickness of argillic diagnostic (cm)', main='')

hist(f$depth_to_argillic_cm, xlab = 'Depth to argillic diagnostic (cm)', main = '')

4.3.1.2 Follow along with your own data

# start fresh with your own data
f <- fetchNASIS(from = 'pedons')
# get diagnostic features associated with pedons loaded from selected set
d <- diagnostic_hz(f)
# summary of the diagnostic features in your data!
unique(d$featkind)
# top 5 most frequent
sort(table(d$featkind), decreasing = TRUE)[1:5]

# subset argillic horizons - or choose your own diagnostic feature and modify this script!
#idx <- which(d$diag_kind == 'your_diagnostic')
#d <- d[idx, ]

# how would you do the rest.....see if you can work it out!

What can you do with the Boolean diagnostic feature data?

4.3.2 Diagnostic Feature Diagrams

## work up diagnostic plot based on gopheridge dataset
library(aqp)
library(soilDB)
library(sharpshootR)

# load data
data(gopheridge)

# can limit which diagnostic features to show by setting 'v' manually
v <- c('ochric.epipedon', 'cambic.horizon', 'argillic.horizon', 'paralithic.contact', 'lithic.contact')

# generate diagnostic property diagram
diagnosticPropertyPlot(gopheridge, v, k=5, grid.label='site_id', dend.label = 'taxonname', sort.vars = FALSE)

# plot again, this time with diagnostic features ordered according to co-occurrence
diagnosticPropertyPlot(gopheridge, v, k=5, grid.label='site_id', dend.label = 'taxonname', sort.vars = TRUE)

4.3.2.1 Follow along with your own data

Use the following script to generate a diagnostic-feature diagram for the pedon data you’ve loaded from your NASIS selected set. Note: If the data includes more than about 20 pedons, the script might generate figures that are very hard to read. You also need to be certain that pedon diagnostic feature were populated in your data.

library(soilDB)
library(sharpshootR)

# load data
f <- fetchNASIS(from = 'pedons')

# may need to subset to a particular series or taxa here....to reduce the number of pedons!

# select a series of diagnostic properties or automatically pull diagnostic feature columns
# get all diagnostic feature columns from site data by pattern matching on '[.]' in the colnames
idx <- grep('[.]', colnames(site(f)))
v <- colnames(site(f))[idx]
v

# or insert diagnostics of interest found in your data here from the list of possible diagnostics in 'v'
v <- c('ochric.epipedon', 'cambic.horizon', 'argillic.horizon', 'paralithic.contact', 'lithic.contact')

# generate diagnostic property diagram
diagnosticPropertyPlot(f, v, k=5, grid.label='site_id', dend.label = 'taxonname')

For more information on generating diagnostic feature diagrams, see the following tutorial: Diagnostic Feature Property Plots.

4.4 Customized Queries to Local NASIS Database

There are times when it is necessary to send customized queries to the local NASIS database. Queries are written in T-SQL which is the dialect of SQL used to communicate with Microsoft SQL Servers (e.g. the local NASIS database). The fetchNASIS and related convenience functions are essentially wrappers around commonly used chunks of SQL.

The following example will return all records from the sitesoiltemp table, along with a couple of fields from the site, siteobs, and pedon tables. This is a convenient way to collect all of the field-based soil temperature data associated with the pedons in your selected set for further analysis.

library(soilDB)
library(RODBC)

# write query as a long text object
q <- "
-- columns to return
SELECT siteiid as siteiid, peiid, usiteid as site_id, upedonid as pedon_id, obsdate as obs_date,
soitemp, soitempdep

FROM
-- tables that are queried and join conditions
site_View_1 
INNER JOIN siteobs_View_1 ON site_View_1.siteiid = siteobs_View_1.siteiidref
LEFT OUTER JOIN sitesoiltemp_View_1 ON siteobs_View_1.siteobsiid = sitesoiltemp_View_1.siteobsiidref
LEFT OUTER JOIN pedon_View_1 ON siteobs_View_1.siteobsiid = pedon_View_1.siteobsiidref
-- ordering of rows
ORDER BY obs_date, siteiid;"

# setup connection local NASIS
channel <- odbcDriverConnect(connection = getOption("soilDB.NASIS.credentials"))

# exec query
d <- sqlQuery(channel, q, stringsAsFactors=FALSE)

# close connection
odbcClose(channel)

# check results
str(d)

# remove records missing values
d <- na.omit(d)

# tabulate unique soil depths
table(d$soitempdep)

# extract doy of year
d$doy <- as.integer(format(d$obs_date, "%j"))

# when where measurements collected?
hist(d$doy, xlim=c(1,366), breaks=30, las=1, main='Soil Temperature Measurements', xlab='Day of Year')

# soil temperature by day of year
plot(soitemp ~ doy, data=d, type='p', xlim=c(1, 366), ylim=c(-1, 25), xlab='Day of Year', ylab='Soil Temperature at 50cm (deg C)', las=1)

This document is based on aqp version 1.19, soilDB version 2.5.1, and sharpshootR version 1.6.